---
title: Predicting Customer Churn with MindsDB
sidebarTitle: Customer Churn
---

<Note>
This tutorial uses the Lightwood integration that requires the `mindsdb/mindsdb:lightwood` Docker image. [Learn more here](/setup/self-hosted/docker#install-mindsdb).
</Note>

## Introduction

In this tutorial, we'll create and train a machine learning model, or as we call
it, an `AI Table` or a `predictor`. By querying the model, we'll predict the
probability of churn for new customers of a telecoms company.

Make sure you have access to a working MindsDB installation, either locally or
at [MindsDB Cloud](https://cloud.mindsdb.com/).

If you want to learn how to set up your account at MindsDB Cloud, follow
[this guide](/setup/cloud/). Another way is to set up
MindsDB locally using
[Docker](/setup/self-hosted/docker/) or
[Python](/setup/self-hosted/pip/source/).

Let's get started.

## Data Setup

### Connecting the Data

There are a couple of ways you can get the data to follow through with this
tutorial.

<Tabs>
  <Tab title="Connecting as a database">
    You can connect to a demo database that we've prepared for you. It contains the data used throughout this tutorial (the `example_db.demo_data.customer_churn` table).

    ```sql
    CREATE DATABASE example_db
    WITH ENGINE = "postgres",
    PARAMETERS = {
        "user": "demo_user",
        "password": "demo_password",
        "host": "3.220.66.106",
        "port": "5432",
        "database": "demo"
    };
    ```

    Now you can run queries directly on the demo database. Let's preview the data that we'll use to train our predictor.

    ```sql
    SELECT *
    FROM example_db.demo_data.customer_churn
    LIMIT 10;
    ```

  </Tab>
  <Tab title="Connecting as a file">
    You can download [the `CSV` data file here](https://github.com/mindsdb/mindsdb-examples/blob/master/classics/customer_churn/raw_data/WA_Fn-UseC_-Telco-Customer-Churn.csv) and upload it via [MindsDB SQL Editor](/connect/mindsdb_editor/).

    Follow [this guide](/sql/create/file/) to find out how to upload a file to MindsDB.

    Now you can run queries directly on the file as if it were a table. Let's preview the data that we'll use to train our predictor.

    ```sql
    SELECT *
    FROM files.churn
    LIMIT 10;
    ```

  </Tab>
</Tabs>

<Warning>
**Pay Attention to the Queries** 
From now on, we'll use the
`files.churn` file as a table. Make sure you replace it with
`example_db.demo_data.customer_churn` if you connect the data as a database.

</Warning>

### Understanding the Data

We use the customer churn dataset, where each row is one customer, to predict
whether the customer is going to stop using the company products.

Below is the sample data stored in the `files.churn` table.

```sql
+----------+------+-------------+-------+----------+------+------------+----------------+---------------+--------------+------------+----------------+-----------+-----------+---------------+--------------+----------------+-------------------------+--------------+------------+-----+
|customerID|gender|SeniorCitizen|Partner|Dependents|tenure|PhoneService|MultipleLines   |InternetService|OnlineSecurity|OnlineBackup|DeviceProtection|TechSupport|StreamingTV|StreamingMovies|Contract      |PaperlessBilling|PaymentMethod            |MonthlyCharges|TotalCharges|Churn|
+----------+------+-------------+-------+----------+------+------------+----------------+---------------+--------------+------------+----------------+-----------+-----------+---------------+--------------+----------------+-------------------------+--------------+------------+-----+
|7590-VHVEG|Female|0            |Yes    |No        |1     |No          |No phone service|DSL            |No            |Yes         |No              |No         |No         |No             |Month-to-month|Yes             |Electronic check         |29.85         |29.85       |No   |
|5575-GNVDE|Male  |0            |No     |No        |34    |Yes         |No              |DSL            |Yes           |No          |Yes             |No         |No         |No             |One year      |No              |Mailed check             |56.95         |1889.5      |No   |
|3668-QPYBK|Male  |0            |No     |No        |2     |Yes         |No              |DSL            |Yes           |Yes         |No              |No         |No         |No             |Month-to-month|Yes             |Mailed check             |53.85         |108.15      |Yes  |
|7795-CFOCW|Male  |0            |No     |No        |45    |No          |No phone service|DSL            |Yes           |No          |Yes             |Yes        |No         |No             |One year      |No              |Bank transfer (automatic)|42.3          |1840.75     |No   |
|9237-HQITU|Female|0            |No     |No        |2     |Yes         |No              |Fiber optic    |No            |No          |No              |No         |No         |No             |Month-to-month|Yes             |Electronic check         |70.7          |151.65      |Yes  |
+----------+------+-------------+-------+----------+------+------------+----------------+---------------+--------------+------------+----------------+-----------+-----------+---------------+--------------+----------------+-------------------------+--------------+------------+-----+
```

Where:

| Column             | Description                                                                                                             | Data Type           | Usage   |
| ------------------ | ----------------------------------------------------------------------------------------------------------------------- | ------------------- | ------- |
| `CustomerId`       | The identification number of a customer.                                                                                | `character varying` | Feature |
| `Gender`           | The gender of a customer.                                                                                               | `character varying` | Feature |
| `SeniorCitizen`    | It indicates whether the customer is a senior citizen (`1`) or not (`0`).                                               | `integer`           | Feature |
| `Partner`          | It indicates whether the customer has a partner (`Yes`) or not (`No`).                                                  | `character varying` | Feature |
| `Dependents`       | It indicates whether the customer has dependents (`Yes`) or not (`No`).                                                 | `character varying` | Feature |
| `Tenure`           | Number of months the customer has been staying with the company.                                                        | `integer`           | Feature |
| `PhoneService`     | It indicates whether the customer has a phone service (`Yes`) or not (`No`).                                            | `character varying` | Feature |
| `MultipleLines`    | It indicates whether the customer has multiple lines (`Yes`) or not (`No`, `No phone service`).                         | `character varying` | Feature |
| `InternetService`  | Customer’s internet service provider (`DSL`, `Fiber optic`, `No`).                                                      | `character varying` | Feature |
| `OnlineSecurity`   | It indicates whether the customer has online security (`Yes`) or not (`No`, `No internet service`).                     | `character varying` | Feature |
| `OnlineBackup`     | It indicates whether the customer has online backup (`Yes`) or not (`No`, `No internet service`).                       | `character varying` | Feature |
| `DeviceProtection` | It indicates whether the customer has device protection (`Yes`) or not (`No`, `No internet service`).                   | `character varying` | Feature |
| `TechSupport`      | It indicates whether the customer has tech support (`Yes`) or not (`No`, `No internet service`).                        | `character varying` | Feature |
| `StreamingTv`      | It indicates whether the customer has streaming TV (`Yes`) or not (`No`, `No internet service`).                        | `character varying` | Feature |
| `StreamingMovies`  | It indicates whether the customer has streaming movies (`Yes`) or not (`No`, `No internet service`).                    | `character varying` | Feature |
| `Contract`         | The contract term of the customer (`Month-to-month`, `One year`, `Two year`).                                           | `character varying` | Feature |
| `PaperlessBilling` | It indicates whether the customer has paperless billing (`Yes`) or not (`No`).                                         | `character varying` | Feature |
| `PaymentMethod`    | Customer’s payment method (`Electronic check`, `Mailed check`, `Bank transfer (automatic)`, `Credit card (automatic)`). | `character varying` | Feature |
| `MonthlyCharges`   | The monthly charge amount.                                                                                              | `money`             | Feature |
| `TotalCharges`     | The total amount charged to the customer.                                                                               | `money`             | Feature |
| `Churn`            | It indicates whether the customer churned (`Yes`) or not (`No`).                                                        | `character varying` | Label   |

<Info>
**Labels and Features**

A **label** is a column whose values will be predicted (the y variable in simple
linear regression).

A **feature** is a column used to train the model (the x variable in simple
linear regression).

</Info>

## Training a Predictor

Let's create and train the machine learning model. For that, we use the
[`CREATE MODEL`](/sql/create/model) statement and specify the
input columns used to train `FROM` (features) and what we want to
`PREDICT` (labels).

```sql
CREATE MODEL mindsdb.customer_churn_predictor
FROM files
  (SELECT * FROM churn)
PREDICT Churn;
```

We use all of the columns as features, except for the `Churn` column, whose
values will be predicted.

## Status of a Predictor

A predictor may take a couple of minutes for the training to complete. You can
monitor the status of the predictor by using this SQL command:

```sql
DESCRIBE customer_churn_predictor;
```

If we run it right after creating a predictor, we get this output:

```sql
+------------+
| status     |
+------------+
| generating |
+------------+
```

A bit later, this is the output:

```sql
+----------+
| status   |
+----------+
| training |
+----------+
```

And at last, this should be the output:

```sql
+----------+
| status   |
+----------+
| complete |
+----------+
```

Now, if the status of our predictor says `complete`, we can start making
predictions!

## Making Predictions

### Making a Single Prediction

You can make predictions by querying the predictor as if it were a table. The
[`SELECT`](/sql/api/select/) statement lets you make predictions for the label
based on the chosen features.

```sql
SELECT Churn, Churn_confidence, Churn_explain
FROM mindsdb.customer_churn_predictor
WHERE SeniorCitizen=0
AND Partner='Yes'
AND Dependents='No'
AND tenure=1
AND PhoneService='No'
AND MultipleLines='No phone service'
AND InternetService='DSL';
```

On execution, we get:

```sql
+-------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Churn | Churn_confidence    | Churn_explain                                                                                                                                                    |
+-------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Yes   | 0.7752808988764045  | {"predicted_value": "Yes", "confidence": 0.7752808988764045, "anomaly": null, "truth": null, "probability_class_No": 0.4756, "probability_class_Yes": 0.5244}    |
+-------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```

To get more accurate predictions, we should provide as much data as possible in
the `WHERE` clause. Let's run another query.

```sql
SELECT Churn, Churn_confidence, Churn_explain
FROM mindsdb.customer_churn_predictor
WHERE SeniorCitizen=0
AND Partner='Yes'
AND Dependents='No'
AND tenure=1
AND PhoneService='No'
AND MultipleLines='No phone service'
AND InternetService='DSL'
AND Contract='Month-to-month'
AND MonthlyCharges=29.85
AND TotalCharges=29.85
AND OnlineBackup='Yes'
AND OnlineSecurity='No'
AND DeviceProtection='No'
AND TechSupport='No'
AND StreamingTV='No'
AND StreamingMovies='No'
AND PaperlessBilling='Yes'
AND PaymentMethod='Electronic check';
```

On execution, we get:

```sql
+-------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Churn | Churn_confidence    | Churn_explain                                                                                                                                                    |
+-------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Yes   | 0.8202247191011236  | {"predicted_value": "Yes", "confidence": 0.8202247191011236, "anomaly": null, "truth": null, "probability_class_No": 0.4098, "probability_class_Yes": 0.5902}    |
+-------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```

MindsDB predicted the probability of this customer churning with confidence of
around 82%. The previous query predicted it with confidence of around 79%. So
providing more data improved the confidence level of predictions.

### Making Batch Predictions

Also, you can make bulk predictions by joining a data table with your predictor
using [`JOIN`](/sql/api/join).

```sql
SELECT t.customerID, t.Contract, t.MonthlyCharges, m.Churn
FROM files.churn AS t
JOIN mindsdb.customer_churn_predictor AS m
LIMIT 100;
```

On execution, we get:

```sql
+----------------+-------------------+------------------+---------+
| customerID     | Contract          | MonthlyCharges   | Churn   |
+----------------+-------------------+------------------+---------+
| 7590-VHVEG     | Month-to-month    | 29.85            | Yes     |
| 5575-GNVDE     | One year          | 56.95            | No      |
| 3668-QPYBK     | Month-to-month    | 53.85            | Yes     |
| 7795-CFOCW     | One year          | 42.3             | No      |
| 9237-HQITU     | Month-to-month    | 70.7             | Yes     |
+----------------+-------------------+------------------+---------+
```

## What's Next?

Have fun while trying it out yourself!

- Bookmark [MindsDB repository on GitHub](https://github.com/mindsdb/mindsdb).
- Sign up for a free [MindsDB account](https://cloud.mindsdb.com/register).
- Engage with the MindsDB community on
  [Slack](https://mindsdb.com/joincommunity) or
  [GitHub](https://github.com/mindsdb/mindsdb/discussions) to ask questions and
  share your ideas and thoughts.

If this tutorial was helpful, please give us a GitHub star
[here](https://github.com/mindsdb/mindsdb).
